Experimenting with Dask integration #208
Draft
+103
−4
After the Dask discussion two weeks ago (see https://github.com/orgs/xpublish-community/discussions/4) I sat down and sketched out what an implementation could look like in Xpublish. It's really rough, and thoroughly un-tested.
This adds two local plugins, plus the associated infrastructure so that most hooks are able to use Dask.
In most cases, for the various kinds of Dask infrastructure, a plugin that provides a `get_dask_cluster()` method should do the trick. The hook is set up to return only one result, and the built-in plugin is consulted last, so any user plugin can override it.

The Dask client plugin should in theory work with different types of clusters, but is similarly set up so it can be overridden (dask-on-ray?). The client can be both sync and async, and once it gets accessed, it's cached on `xpublish.Rest`.
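The hook resolution and caching behavior described above can be sketched in plain Python (no Dask required). `CustomClusterPlugin`, `BuiltInClusterPlugin`, and the `Rest` stand-in here are illustrative only; they mimic the firstresult-style hook and the lazy caching on `xpublish.Rest`, not the actual implementation in this PR:

```python
from functools import cached_property


class CustomClusterPlugin:
    def get_dask_cluster(self):
        # e.g. a dask-gateway or dask-on-ray cluster in a real plugin
        return "custom-cluster"


class BuiltInClusterPlugin:
    def get_dask_cluster(self):
        # default local-cluster-style fallback
        return "local-cluster"


class Rest:
    def __init__(self, plugins):
        # The built-in plugin goes last, so user plugins override it.
        self._plugins = [*plugins, BuiltInClusterPlugin()]

    def _resolve_cluster(self):
        # firstresult semantics: the first non-None result wins.
        for plugin in self._plugins:
            cluster = plugin.get_dask_cluster()
            if cluster is not None:
                return cluster
        return None

    @cached_property
    def dask_cluster(self):
        # Created on first access, then cached on this Rest instance.
        return self._resolve_cluster()


rest = Rest([CustomClusterPlugin()])
assert rest.dask_cluster == "custom-cluster"   # user plugin overrides built-in
assert Rest([]).dask_cluster == "local-cluster"  # built-in is the fallback
```

Because the built-in plugin sits at the end of the list, registering any plugin that returns a non-None cluster transparently replaces the default.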
For hooks that have access to `deps` (which now includes dataset providers), `deps.dask_sync_client` and `deps.dask_async_client` should now give you the client.

The async client may need to be passed the current event loop. It appears that the way to access the event loop varies by server, so that will probably take some research.
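On the event-loop question: from inside a coroutine the server is driving, the generic route is `asyncio.get_running_loop()`, which could then in principle be handed to the asynchronous Dask client. Whether every server (uvicorn, hypercorn, ...) runs handlers on a loop that this reliably reflects is exactly the open question; this sketch only shows the plain asyncio approach and is untested against Dask:

```python
import asyncio


async def make_async_client():
    # Inside a running coroutine, this returns the loop the server drives.
    # Hypothetically, this loop is what would be handed to the async Dask
    # client so it schedules its work on the server's loop, not its own.
    loop = asyncio.get_running_loop()
    return loop


# Simulate a server driving the coroutine on its own event loop.
loop = asyncio.run(make_async_client())
```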